WEB-BASED ELECTRONIC MAIL SERVICE 
APPARATUS AND METHOD USING FULL 
TEXT AND LABEL INDEXING 

FIELD OF THE INVENTION 

The present invention relates generally to electronic mail, 
and more particularly to electronic mail messaging in a 
distributed computer system. 

BACKGROUND OF THE INVENTION 

With the advent of large scale distributed computer sys- 
tems such as the Internet, the amount of information which 
has become available to users of computer systems has 
exploded. Among this information is electronic mail 
(e-mail). With the improvements in means for composing 
and distributing written messages, the amount of e-mail 
traffic on the Internet has surged. It is not unusual for an 
active Internet user to be exposed to tens of thousands of 
e-mail messages a year. 

As an advantage, the Internet allows users to interchange 
useful information in a timely and convenient manner. 
However, keeping track of this huge amount of information 
has become a problem. As an additional advantage, the 
Internet now allows users to exchange information in a 
number of different presentation modalities, such as text, 
audio, and still and moving images. Adapting e-mail systems 
to organize such complex information, and providing effi- 
cient means to coherently retrieve the information is not 
trivial. 

As a disadvantage, Internet users may receive junk-mail 
whenever they send to mailing lists or engage in news 
groups. There are numerous reported incidents where spe- 
cific users have been overwhelmed by thousands of 
unwanted mail messages. Current filtering systems are
inadequate to deal with this deluge.

Known distributed systems for composing and accessing 
e-mail are typically built around protocols such as IMAP, 
POP, or SMTP. Typically, users must install compatible user 
agent software on any client computers where the mail 
service is going to be accessed. Often, a significant amount 
of state information is maintained in the users' client com- 
puters. For example, it is not unusual to store the entire mail 
database for a particular user in his desk-top or lap-top 
computer. Normally, the users explicitly organize mail mes- 
sages into subject folders. Accessing mail generally involves 
shipping entire messages over the network to the client 
computer. 

Such systems are deficient in a number of ways. Most 
computers that a user will encounter will not be configured 
with user agents compatible with the user's mail service. 
Often, a user's state is captured in a specific client computer 
which means that work cannot proceed when the user moves 
to another computer. Managing large quantities of archival 
mail messages by an explicit folder organization is difficult 
for most users. Accessing mail over a low bandwidth net- 
work tends to be unsatisfactory. 

Therefore, it is desired to provide a mail system that 
overcomes these deficiencies. 

SUMMARY OF THE INVENTION 

Provided is a distributed mail system where a plurality of
client computers are connected to each other and a mail
service system via a network. Each client computer is
configured to execute client mail application programs. The
mail service system is for executing server mail programs on
server computers. The mail service system includes an index
server for storing mail messages in message files, and for
storing a full-text index of the mail messages. In addition,
the system includes means for accessing the mail messages
by the plurality of client computers by searching the full-text
index using queries.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a block diagram of an arrangement of a
distributed mail service system which uses the invention;

FIG. 2 is a block diagram of a mail service system of the
arrangement of FIG. 1;

FIG. 3 is a block diagram of an account manager and
account records of the system of FIG. 2;

FIG. 4 is a block diagram of message and log files
maintained by the system of FIG. 2;

FIG. 5 is a flow diagram of a parsing scheme used for mail
messages processed by the system of FIG. 2;

FIG. 6 is a block diagram of a full-text index for the
message files of FIG. 4;

FIG. 7 is a diagram of a labeled message;

FIG. 8 is a diagram of an address book entry;

FIG. 9 is a flow diagram for filtering queries; and

FIG. 10 is a block diagram for a MIME filter.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

System Overview

In FIG. 1, an arrangement 100 provides a distributed mail
service having features according to the invention. In FIG.
1, one or more client computers 111-113 are connected via
a network 120 to a mail service system 200 described in
greater detail below.

Client Computers 

The client computers 111-113 can be workstations, personal
computers, lap-tops, palm-tops, network computers
(NCs), or any other similarly configured computer system.
The clients 111-113 can be owned, borrowed, or rented. It
should be noted that in practice, the clients 111-113 can
potentially be any of the millions of personal computer
systems that are currently extant and connected to a network.
Over time, a user may use different client computers at
different locations.

As shown for computer 111, each client computer
executes standard operating system software (O/S) 114, e.g.,
UNIX(™), Windows95(™), MacOS(™) or NT(™). The
O/S 114 is used to execute application software programs.
One of the application programs which can execute on the
client 110 is a Web browser 115. The Web browser 115 can
be Netscape(™) Navigator(™), Microsoft(™) Explorer(™),
HotJava(™), and other similar browsers.

The functionality of the browser 115 can be extended by
forms, applets, and plug-ins generally indicated by reference
numeral 116. In the preferred embodiment, the browser
extensions are in the form of client mail application
programs described in greater detail below. The client mail
application programs are downloaded over the network 120
from the mail service system 200. The extensions can be
implemented using HTML, JavaScript, Java applets,
Microsoft ActiveX, or combinations thereof to provide
maximum portability.

As shown for computer 112, the client includes one or
more processors (P) 117, memories 118 (M), input/output
interfaces (I/O) 119 connected to each other by a bus 120.
The processors 117 can implement CISC or RISC architectures
in 32, 64, or other bit length data structures. The
memories 118 can include solid state dynamic random
access memory (DRAM) and fixed and removable memories
such as hard disk drives, CD-ROMs, diskettes, and
tapes. The I/O 119 can be connected to input devices such as
a keyboard and a mouse, and output devices such as a
display and a printer. The I/O 119 can also be configured to
connect to multi-media devices such as sound-cards, image
processors, and the like. The I/O also provides the necessary
communications links to the network 120.

In the preferred embodiment, the network 120 includes a
large number of public access points, and communications
are carried out using Internet Protocols (IP). Internet protocols
are widely recognized as a standard way of communicating
data. Higher level protocols such as HTTP and FTP
communicate at the application layer, while lower level
protocols, such as TCP/IP operate at the transport and
network levels.

Part of the Internet includes a data exchange interface
called the World-Wide-Web, or the "Web" for short. The
Web provides a way for formatting, communicating,
interconnecting, and addressing data according to standards
recognized by a large number of software packages. For
example, using the Web, multi-media (text, audio, and
video) data can be arranged as Web pages. The Web pages
can be located by the browser 115 using Uniform Resource
Locators (URLs).

A URL specifies the exact location of a Web-based
resource such as a server or data record. The location can
include domain, server, user, file, and record information,
e.g., "HTTP://www.digital.com/~userd/file.html/~record". An
Internet service can be used to send and receive mail
messages. For example, a mail message can be sent to
the address "jones@mail.digital.com" using the SMTP
protocol. As an advantage, the Internet and the Web allow
users, with only minor practical limitations, to exchange
data no matter where they are using any type of computer
equipment.

Intranet

The mail service system 200 includes one or more server
computers. Usually, the system 200 is part of some private
network (intranet) connected to the public network 120.
Typically, an intranet is a distributed computer system
operated by some private entity for a selected user base, for
example, a corporate network, a government network, or
some commercial network.

Firewall

In order to provide security protection, communications
between components of the Internet and the intranet are
frequently filtered and controlled by a firewall 130. The
purpose of the firewall 130 is to enforce security policies of
the private intranet. One such policy may be "never allow a
client computer to directly connect to an intranet server via
the public portion of the Internet." The firewall, in part,
protects accesses to critical resources (servers and data) of
the intranet.

Only certain types of data traffic are allowed to cross the
firewall 130. Penetration of the firewall 130 is achieved by
a tunnel 131. The tunnel 131 typically performs a secure
challenge-and-response sequence before access is allowed.
Once the identity of a user of a client has been authenticated,
the communications with components of the intranet are
performed via a proxy server, not shown, using secure
protocols such as SSL and X.509 certificates.

Mail Service System

The mail service system 200 can be implemented as one
or more server computers connected to each other either
locally, or over large geographies. As the name implies, a
server computer is configured to execute software
programs on behalf of client computers 111-113.
Sometimes, the term "server" can mean the hardware, the
software, or both because the software programs may
dynamically be assigned to different server computers
depending on load conditions. Servers typically maintain
large centralized data repositories for many users.

In the mail system 200, the servers are configured to
maintain user accounts, to receive, filter, and organize mail
messages so that they can readily be located and retrieved,
no matter how the information in the messages is encoded.

General Operation

During operation of the arrangement 100, users of the
client computers 111-113 desire to perform e-mail services.
These activities typically include composing, reading, and
organizing e-mail messages. Therefore, the client computers
can make connections to the network 120 using a public
Internet service provider (ISP) such as AT&T(™) or
Earthlink(™). Alternatively, a client computer can be
connected to the Internet at a cyber-cafe such as Cybersmith(™),
or the intranet itself via a local area network. Many
other mechanisms can also be used. Once a
connection has been made, a user can perform any mail
service.

As an advantage, structural and functional characteristics
of the arrangement 100 include the following. Mail services
of the system 200 are available through any Web-connected
client computer. The users of the services can be totally
mobile, moving among different clients at will during any of
the mail activities. Composition of a mail message can be
started on one client, completed on another, and sent from
yet another computer.

These characteristics are attained, in part, by never locking
a user's state in one of the client computers in case
access is not possible at a later time. This has the added
benefit that a client computer's local storage does not need
to be backed-up because none of the important data reside
there. In essence, this is based on the notion that the
operating platform is the Web, thus access to the mail service
system via the Web is sufficient to access user data.

The service system will work adequately over a wide
range of connectivity bandwidths, even for mail messages
including data in the form of multi-media. Message retrieval
from a large repository is done using queries of the full-text
index without requiring a complex classification scheme.
The arrangement 100 is designed to incorporate redundancy
techniques such as multiple access paths, and replicated
files using redundant arrays of independent disks
(RAID) technologies.

Mail Service System

As shown in FIG. 2, the mail service system includes the
following components. The system 200 is constructed to
have as a front-end a Web server 210. The server 210 can be
the "Apache" Web server available from the WWW Consortium.
The Web server 210 interacts with back-end
common gateway interface (CGI) programs 220. The programs
interface with an account manager 300, an SMTP mail
server 240, and an index server 250. The CGI programs 220
are one possible mechanism. The programs could also be
implemented by adding the code directly to the Web server
210, or by adding extensions to the NSAPI from Netscape(™).

The top-level functions of the system 200 include send
mail 241, receive mail 242, query index 243, add/remove
label to/from mail 244, and retrieve mail 245. Different
servers can be used for the processes which implement the
functions 241-245.

The account manager 300 maintains account information.
The mail server 240 is used to send and receive mail
messages to and from other servers connected to the network.
The index server 250 maintains mail messages in
message files 400 and a full-text index 500 to messages. The
CGI programs 220 also interact with the message files 400
via a filter 280 for mail message retrieval.

The Web server 210 can be any standard Web server that
implements the appropriate protocols to communicate via
the network using HTTP protocols 201, for example the
Apache server. The CGI back-end programs 220 route
transactions between the Web server 210 and the operational
components of the mail service system. The CGI back-end
220 can be implemented as C and TCL programs executing
on the servers.

Account Manager

As shown in FIG. 3, the account manager 300 maintains
account information 301-303 for users who are allowed to
have access to the mail system 200. Information maintained
for each account can include: mail-box address 310, e.g., in
the form of a Post Office Protocol (POP-3) address, user
password 320, label state 330, named queries 340, filter
queries 350, query position information 360, user preferences
370, and saved composition states 380. The full
meaning and use of the account information will become
apparent as other components of the system 200 are
described.

As an introduction, passwords 320 are used to authenticate
users. Labels 330 are used to organize and retrieve mail
messages. Labels can be likened to annotated notes that can
be added to and removed from messages over their lifetimes;
in other words, labels are mutable. Labels help users organize
their messages into subject areas. At any one time, the label
state captures all labels that are active for a particular user.
Labels will be described in greater detail below.

In the system 200, mail messages are accessed by using
queries. This is in contrast to explicitly specifying subject
folders as are used in many known mail systems. A query is
composed of one or more search terms, perhaps connected by
logical operators, that can be used to retrieve messages. By
specifying the name of a query, a user can easily retrieve
messages related to a particular topic, phrase, date, sender,
etc. Named queries 340 are stored as part of the account
information.

Some queries can be designated as "filter" queries 350.
This allows a user to screen, for example, "junk mail,"
commonly known as spam. Filter queries can also be used
to pre-sort messages received from particular mailing lists.
Query position information records which message the user
last selected with a query. This way the user interface can
position the display of messages with respect to the selected
message when the query is reissued. User preferences 370
specify the appearance and functioning of the user interface
to the mail service as implemented by the extended browser
116 of FIG. 1. Saved composition states 380 allow a user to
compose and send a message using several different client
computers while preparing the message.

The account manager 300 can generate a new account, or
delete an existing account. The account is generated for a
user by specifying the user name and password. Once a
skeletal account has been generated, the user can supply the
remaining information such as labels, named queries, filter
queries, and so forth.

Mail Server

Now with continued reference to FIG. 2, the mail service
system 200 receives (242) new mail messages by communicating
with the mail server 240 using the POP-3 protocol.
Mail messages are sent (241) using the SMTP protocol. The
appropriate routing information in the mail server 240 for a
particular user is generated after the user account has
been generated. A "POP Account Name" should be specified
as the name. In most systems, the name will be case
sensitive. The "POP Host" should be the Internet domain
name of the mail server 240. Here, the case of the letters is
ignored. An IP address such as "16.4.0.16" can be used,
although the domain name is preferred. In some cases, a
particular user's preferred Internet e-mail address may be
unrelated to the POP Account Name, or the POP Host. The
mail server 240 is connected to the Internet by link 249.

The rapid expansion in the amount of information which
is now available on-line has made it much more difficult to
locate pertinent information. The question "in which folder
did I store that message?" becomes more difficult to answer
if the number of messages that one would like to save
increases over long time periods to many thousands. The
importance and frequency of accessed messages can vary.

Traditionally, the solution has been to structure the mail
messages in a hierarchical manner, e.g., files, folders,
sub-folders, sub-sub-folders, etc. However, it has been
recognized that such structures do not scale easily because filing
strategies are not consistent over time. Many users find that
hierarchical structures are inadequate for substantial quantities
of e-mail messages accumulated over many years,
particularly since the meaning and relation of messages
change over time. Most systems with an explicit filing
strategy require constant and tedious attention to keep the
hierarchical ordering consistent with current needs.

Message Repository

Messages are stored in message files 400 and a full-text
index 500. The organization of the message files is first
described. This is followed by a description of the full-text
index 500. As a feature of the present invention, user
interaction with the mail messages is primarily by queries
performed on the full-text index 500.

As shown in FIG. 4, the index server 250 assigns each
received message 401-402 a unique identification (MsgID)
410. The MsgID 410 is composed of a file identification
(FileID) 411, and a message number (MsgNum) 412. The
FileID "names," or is a pointer to, a specific message file 420,
and the MsgNum is some arbitrary numbering of messages
in a file, e.g., an index into the file 420.

A message never changes after it has been filed. Also, the
MsgID 410 forever identifies the same message, and is the
only ID for the message. In the referenced message file 420,
a message entry 430 includes the MsgNum stored at field
431, labels 432, and the content of the message itself in field
433.

The number of separate files 420 that are maintained for
storing messages can depend on the design of the underlying
file system and specific implementation details. For
example, the size and number of entries of a particular file
may be limited by the file system. Also, having multiple files
may facilitate file maintenance functions such as back-up
and restore.
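
The message identification scheme described above can be illustrated with a minimal Python sketch. The class and function names (MsgID, MessageEntry, MessageFile, fetch) are hypothetical and chosen only for illustration; in particular, using the position within a list as the MsgNum is an assumption consistent with "an index into the file", not a detail given in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MsgID:
    """Immutable identifier: FileID 411 plus MsgNum 412."""
    file_id: int   # names one message file
    msg_num: int   # numbering of the message within that file


@dataclass
class MessageEntry:
    """One entry of a message file: MsgNum, labels, and content."""
    msg_num: int
    content: str                      # never modified after filing
    labels: set = field(default_factory=set)


class MessageFile:
    """Append-only container of message entries (hypothetical layout)."""

    def __init__(self, file_id):
        self.file_id = file_id
        self.entries = []

    def append(self, content, labels=()):
        # Assumption: MsgNum is simply the position of the entry in the file.
        msg_num = len(self.entries)
        self.entries.append(MessageEntry(msg_num, content, set(labels)))
        return MsgID(self.file_id, msg_num)


def fetch(files, msg_id):
    """Resolve a MsgID to its entry: pick the file, then index into it."""
    return files[msg_id.file_id].entries[msg_id.msg_num]


files = {7: MessageFile(7)}
first = files[7].append("Lunch on Friday?", labels=["inbox", "unread"])
second = files[7].append("Quarterly report attached")
```

Because entries are only ever appended, a MsgID handed out once keeps pointing at the same message, which is the property the text relies on.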

Label Log 

Although a message may never change, the set of labels
associated with a message may change. Because labels can
change, a transaction log 440 is also maintained. The log 440
includes "add" entries (+label) 450, and "remove" entries
(-label) 460. Each entry includes the MsgID 451 or 453 of
the affected message entry, and the label that is being added
(452) or deleted (453). The contents of the log 440 are
occasionally merged with the message files 420. Merged
entries are removed from the log 440. The label log 440
allows for the mutation of labels attached to data records
such as mail messages, where the labels and the data which
are labeled are stored in the same index.
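
As a rough illustration of the label log, the following Python sketch records "+label" and "-label" entries, overlays them on the stored labels when a message is read, and occasionally merges them. The class LabelLog and its method names are hypothetical; the text does not specify an API, and representing stored labels as a dictionary keyed by MsgID is an assumption.

```python
from collections import defaultdict


class LabelLog:
    """Transaction log of label mutations not yet merged into the files."""

    def __init__(self):
        self.entries = []  # tuples (op, msg_id, label), op is "+" or "-"

    def add(self, msg_id, label):
        self.entries.append(("+", msg_id, label))

    def remove(self, msg_id, label):
        self.entries.append(("-", msg_id, label))

    def effective_labels(self, stored, msg_id):
        """Labels of a message: stored labels overlaid with logged changes."""
        labels = set(stored.get(msg_id, ()))
        for op, logged_id, label in self.entries:
            if logged_id != msg_id:
                continue
            if op == "+":
                labels.add(label)
            else:
                labels.discard(label)
        return labels

    def merge_into(self, stored):
        """Apply all logged entries to the stored labels, then empty the log."""
        for op, msg_id, label in self.entries:
            if op == "+":
                stored[msg_id].add(label)
            else:
                stored[msg_id].discard(label)
        self.entries.clear()


stored = defaultdict(set)            # MsgID -> labels kept with the message
stored[(7, 0)].update({"inbox", "unread"})

log = LabelLog()
log.remove((7, 0), "unread")         # "-label" entry
log.add((7, 0), "urgent")            # "+label" entry
before_merge = log.effective_labels(stored, (7, 0))
log.merge_into(stored)
```

The stored labels stay untouched until the merge, so frequent label changes cost only an append to the log.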

Full-Text Index 

FIGS. 5 and 6 show how the index server 250 generates
the full-text index 500. Newly received mail messages are
processed in batches 403-404. Messages 401 and 402 of a
batch are parsed into individual words 510. A batch 403 in
a large mail service system may include hundreds or
thousands of messages. The words of the messages are parsed in
the order that they are received in a batch. Each word is
arbitrarily assigned a sequential location number 520.

For example, the very first word of the very first message
of the very first batch is assigned location "1," the next word
location "2," and the last word location "3." The first word
of the next message is assigned the next sequential location
"4," and so forth. Once a location has been assigned to a
word, the assignment never changes. If the location is
expressed as a 64 bit number, then it is extremely unlikely
that there will ever be an overlap on locations.
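
The location numbering can be sketched in a few lines of Python. The class name LocationAssigner, the regular expression used as a word separator rule, and the sample message texts are all assumptions made for illustration; the sample texts were chosen so that the word "me" lands at locations 3 and 5.

```python
import re
from itertools import count

# Assumption: a word is any run of letters or digits.
WORD = re.compile(r"[A-Za-z0-9]+")


class LocationAssigner:
    """Hands out ever-increasing locations across messages and batches."""

    def __init__(self):
        self._next = count(1)  # the first word ever parsed gets location 1

    def parse_batch(self, messages):
        """Return (word, location) pairs in the order the words are parsed."""
        pairs = []
        for text in messages:
            for word in WORD.findall(text.lower()):
                pairs.append((word, next(self._next)))
        return pairs


assigner = LocationAssigner()
first_batch = assigner.parse_batch(["Please call me", "Tell me"])
second_batch = assigner.parse_batch(["Thanks"])
```

Because the counter is never reset, a location assigned in one batch is never reused in a later batch.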

As the messages are parsed, the indexing process generates
additional "metawords" 530. For example, an
end-of-message (eom) metaword is generated for the last word of
each message. The metawords are assigned the same
locations as the words which triggered their generation. In the
example shown, the location of the first eom metaword is
"3," and the second is "5."

Other parts of the message, such as the "To," "From,"
"Subject," and "Date" fields may generate other distinctive
metawords to help organize the full-text index 500.
Metawords help facilitate searches of the index. Metawords are
appended with predetermined characters so that there is no
chance that a metaword will ever be confused with an actual
parsed word. For example, metawords include characters
such as "space" which are never allowed in words.
Hereinafter, the term "words" means both actual words and
synthesized metawords.
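
A minimal Python sketch of metaword generation follows. It assumes that a metaword is made distinct by a leading space character and that words are separated by whitespace; the names EOM and index_with_eom and the sample messages are hypothetical.

```python
# Assumption: a leading space marks a metaword, since a parsed word
# can never contain a space.
EOM = " eom"


def index_with_eom(messages, start=1):
    """Assign locations to words and emit an eom metaword per message.

    The eom metaword shares the location of the last word of its message.
    Returns the (word, location) pairs and the next free location.
    """
    pairs = []
    location = start
    for text in messages:
        words = text.lower().split()
        for position, word in enumerate(words):
            pairs.append((word, location))
            if position == len(words) - 1:
                pairs.append((EOM, location))  # same location as the trigger
            location += 1
    return pairs, location


pairs, next_location = index_with_eom(["please call me", "tell me"])
eom_locations = [loc for word, loc in pairs if word == EOM]
```

In this sketch a message with no words produces no eom metaword; how empty messages are handled is not stated in the text.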

After a batch of messages has been parsed, the words
and their assigned locations are sorted 540, first according to
the collating order of the words, and second according to their
sequential locations. For example, the word "me" appears at
locations "3" and "5" as shown in box 550. The sorted batch
550 of words and locations is used to generate the index.
Each sorted batch 550 is merged into the index 500, initially
empty.
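
The sort-and-merge step can be sketched as follows, assuming the index is held as an in-memory mapping from each word to its list of locations. The function names and sample words are hypothetical, and the on-disk structure described in the Burrows reference is not modeled here.

```python
def sort_batch(pairs):
    """Sort (word, location) pairs by word first, then by location."""
    return sorted(pairs)  # tuples compare element by element


def merge_batch(index, sorted_batch):
    """Merge a sorted batch into the index (word -> list of locations).

    Locations only ever grow from batch to batch, so appending keeps
    each word's location list in ascending order.
    """
    for word, location in sorted_batch:
        index.setdefault(word, []).append(location)
    return index


batch = [("please", 1), ("call", 2), ("me", 3), ("tell", 4), ("me", 5)]
sorted_batch = sort_batch(batch)
index = merge_batch({}, sorted_batch)          # the index starts out empty
merge_batch(index, sort_batch([("call", 6), ("me", 7)]))
```

A Python dict does not keep its keys in collating order; iterating over sorted(index) gives the order in which the word entries would be stored.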

Index Structure 

FIG. 6 shows the logical structure of an index 600
according to the preferred embodiment. The index includes
a plurality of word entries 610. Each word entry 610 is
associated with a unique "word," that appeared at least once
in some indexed message. The term "word" is used very
loosely here, since the parsing of the words in practice
depends on which marks/characters are used as word
separators. Words do not need to be real words that can be found
in a dictionary. Separators can be spacing and punctuation
marks.

The indexer 250 will parse anything in a message that can 
be identified as a distinct set of characters delineated by 
word separators. Dates are also parsed and placed in the 
index. Dates are indexed so that searches on date ranges are 
possible. In an active index there may well be millions of 
different words. Therefore, in actual practice, compression 
techniques are extensively used to keep the files to a
reasonable size, and allow updating of the index 500 as it is
being used. The details of the physical on-disk structure of
the index 600, and the maintenance thereof are described in
U.S. patent application Ser. No. 08/696,060, "Web index,"
filed by M. Burrows on Aug. 9, 1996, incorporated in its
entirety herein by reference. 

The word entries 610 are stored in the collating order of 
the words. The word is stored in a word field 611 of the entry 
610. The word field 611 is followed by location fields (locs) 
612. There is one location field 612 for every occurrence of 
the word 611. As described in the Burrows reference, the 
locations are actually stored as a sequence of delta-values to 
reduce storage. The index 600 is fully populated. This means 
the last byte 614 of the last location field of a word is 
immediately followed by the first byte 615 of the next word 
field. 
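
The idea of storing locations as delta-values can be shown with a short Python sketch. This illustrates plain delta encoding only; the actual compression format is defined in the Burrows reference and is not reproduced here, and the function names are hypothetical.

```python
from itertools import accumulate


def to_deltas(locations):
    """Replace each ascending location by its distance from the previous one."""
    deltas = []
    previous = 0
    for location in locations:
        deltas.append(location - previous)
        previous = location
    return deltas


def from_deltas(deltas):
    """Recover the absolute locations by running summation."""
    return list(accumulate(deltas))


locations = [3, 5, 1000, 1004]
deltas = to_deltas(locations)
```

The deltas of a frequently occurring word are small numbers, which is what makes them cheaper to store than the absolute 64 bit locations.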

Labels 

Labels provide a way for users to annotate mail messages. 
Attaching a label to a message is similar to affixing a note 
to a printed document. Labels can be used to replace the 
folder mechanisms used by many prior art mail systems. 
However, a single mail message can be annotated with 
multiple labels. This compares favorably to folder-based 
systems where a message can only be stored in a single 
folder. 

Users can define a set of labels with which to work. The 
labels are nothing more than predefined text strings. The 
currently active set of labels for a particular user, e.g. the 
label state 330 of FIG. 3, is maintained by the account 
manager 300 and is displayed in a window of the graphical 
user interface. Labels can be added and removed by the 
system or by users. 

As shown in FIG. 6, labels are stored in a data structure 
650 that parallels and extends the functionality of full-text 
index 500. Labels are subject to the same constraints as 
index words. Also, queries on the full-text index 500 can 
contain labels, as well as words, as search terms. A label is 
added to a mail message by adding a specific index location 
(or locations) within the message to the set of locations 
referred to by the specified label. Label removal is the 
opposite. Operations on labels are much more efficient than 
other operations that mutate the state of the full-text index. 
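
The following Python sketch shows one way a label could be attached by adding an index location to the label's location set, and how a search can then mix words and labels. It assumes that the location used for a label is the end-of-message location of the labeled message; the text only says "a specific index location (or locations) within the message". The class LabeledIndex and its methods are hypothetical.

```python
import bisect


class LabeledIndex:
    """Word index plus a parallel label structure over the same locations."""

    def __init__(self, words, message_ends):
        self.words = words                # word -> ascending locations
        self.message_ends = message_ends  # ascending eom locations
        self.labels = {}                  # label -> set of locations

    def message_of(self, location):
        """Ordinal of the message whose location range contains `location`."""
        return bisect.bisect_left(self.message_ends, location)

    def add_label(self, label, message):
        # Assumption: the label is recorded at the message's eom location.
        self.labels.setdefault(label, set()).add(self.message_ends[message])

    def remove_label(self, label, message):
        self.labels.get(label, set()).discard(self.message_ends[message])

    def messages_with(self, term):
        """Messages matching a term, which may be a label or a word."""
        if term in self.labels:
            locations = self.labels[term]
        else:
            locations = self.words.get(term, ())
        return {self.message_of(location) for location in locations}


index = LabeledIndex(
    words={"call": [2], "me": [3, 5], "please": [1], "tell": [4]},
    message_ends=[3, 5],
)
index.add_label("inbox", 1)
both = index.messages_with("me") & index.messages_with("inbox")
index.remove_label("inbox", 1)
after_removal = index.messages_with("inbox")
```

Adding or removing a label touches only one small set, which is consistent with the statement that label operations are cheaper than other mutations of the full-text index.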

The on-disk data structure for the label index 650 that 
represents the label state 330 is the same as that described 
for index word entries 600. This means that the label state 
can be thought of as an extension of the full-text index 500. 
Accordingly, the label index extension, like the index 500, 
maps labels (words) 651 to sequences of index locations 
652. 

Although the structural formats of the label extension 650
and the full-text index 500 are the same, for efficiency
reasons, the label portion of the index is managed by a
software component that is distinct from the software that
manages the full-text index 500. If a term of a query string
is found to be a label, then the label index 650 is searched
to provide the necessary location mapping. This mapping is
further modified by the label log 440 that contains all recent
label mutations (additions or removals). The label log 440
can include an in-memory version 660. Since operations on
this structure are in-memory, updates for recent label
mutations 660 can be relatively fast while the updating of the
label index 650 can take place in background.

As shown in FIG. 7, a message 700 includes a header 701
and a body. The header 701 typically includes the "To",
"From", "Date" and "Subject" fields. The header may also
include routing information. The body 702 is the text of the
mail message.

Each mail message can initially receive two labels,
"inbox" 710 and "unread" 720. Messages labeled as
"unread" 720 have not yet been exposed for reading.
Messages with the "inbox" label 710 are deemed to require the
user's attention. As will be described below, it is possible for
messages to be labeled as unread but not have the inbox
label. These less important messages can be read by the user
as needed.

Outputting, e.g., displaying or printing, a message
removes the unread label 720 under the assumption that it
has been read. A user can explicitly add or remove the
unread label. A message can be deleted by attaching a
"delete" label 730. This has the effect that the message will
not be seen again because messages labeled as deleted are
normally excluded during searches. Removing the deleted
label has the effect of "un-deleting" a message.

Although a preferred embodiment uses labels for data
records that are mail messages, it should be understood that
"mutable" labels can also be used for other types of data
records. For example, labels which can be added and
removed can be used with data records such as Web-pages,
or news group notes. The key feature here being that labels
are indexed in the same index as the record which they label,
and that labels can be added and removed.

Queries

After e-mail messages have been indexed and labeled, the
messages can be retrieved by issuing full-text queries. A
query searches for messages that match on words and labels
specified in the query. This is in contrast with known mail
systems where users access mail by remembering in which
file, folder, or sub-folder messages have been placed so the
folder can be searched. As an advantage of the present
system, users only need to recall some words and labels to
find matching messages.

The syntax of the query language is similar to that described
in the Burrows reference. A query includes a sequence of
primitive query terms, combined by operators such as "and,"
"or," "not," "near," and so forth. A primitive term can be a
sequence of alpha-numeric characters, i.e., a "word,"
without punctuation marks. If the terms are enclosed by
quotation marks ("), the search is for an exact match on the quoted
string. A term can be a label. A term such as "from:fred"

During normal operation, the CGI program 220 modifies
each issued query by appending a term which excludes the
"deleted" label, e.g., "and not deleted." This has the effect of
hiding all deleted messages from the user of the client. There
is an option in the user interface which inhibits this effect to
make deleted messages visible.

Named Queries

Queries can be named. Named queries are maintained
by the account manager 300. By specifying the name of a
query, users can quickly perform a search for e-mail
messages including frequently used terms. Users can compose
complex queries to match on indexed
messages, perhaps intermixing conditions about messages
having particular text or labels, and to keep the query for
subsequent use.

Named queries can be viewed as a way for replacing prior
art subject folders. Instead of statically organizing messages
into folders according to predetermined conditions, queries
allow the user to retrieve a specific collection of messages
depending on a current set of search terms. In other words,
the conditions which define the collection are dynamically
expressed as a query.

History List

Recently performed queries are kept in a "history" list.
Accordingly, frequently performed queries can readily be
re-issued, for example, when the index has been changed
because of newly received mail, or because of actions taken
by other client computers.

Dynamic Address Book

Queries can also be used to perform the function of prior
art "address books." In many known e-mail systems, users
keep address books of frequently used addresses. From time
to time, users can add and remove addresses. There, the
address books are statically maintained as separate data
structures or address book files. For example, there can be
"personal" and "public" related address books. In contrast,
here, there is no separately stored address book. Instead, an
"address book" is dynamically generated as it is needed. The
dynamic address book is generated from the files 400 and the
full-text index 500 as follows.

As shown in FIG. 8, a user of a client computer 820 can
generate address book type information using a form 800
supplied by one of the client mail application programs 116.
The form 800 includes, for example, entry fields 801-803
for address related information such as name, phone number,
(hard-copy) mail address, and (soft-copy) e-mail address,
and so forth. Alternatively, address information can be
selected from a prior received mail message 805 by clicking
on appropriate fields in the header or body of the message
805.

From the perspective of the mail service system 200 and
the index server 250, the address book information is
handled exactly as a received mail message. This means that,
for example, the data of the fields 801-803 are combined
into an "address book" mail message 810. An "address"

searches for messages with the word "fred" in the "from" 60 label 809 can also be added to the entry using the labeling 

field of a message header. Similar queries can be formulated convention as described herein. The address book mail 

for the "to," "from, ""cc," and "subject" fields of the header. message 810 and label 809 can be stored in one of the 

A term such as "1 1/2/96- 25/Dec/96" searches for all message files 400. Additionally the message 810 can be 

messages in the specified date range. The parsing of dates is parsed and inserted into the full-text index 500 as are the 

flexible, e.g, 12/25/96, 25/12/96, and Dec/25/96 all mean the 65 words and labels of any other mail message. In other words, 

same date. In the case of ambiguity (2/1/96) the European the address information of form 800 is merged and blended 

order (day/month) is assumed. with the full-text index 500. 
After the address information has been filed and indexed,
the address information can be retrieved by the user of the 
client computer 820 composing a query 830 using the 
standard query interface, with perhaps, the label "address" 
as one of the query terms. The exact content to be retrieved 
is determined at the time that the terms and operator of the 
query 830 are composed by the user. The address
information, i.e., one or more address book mail messages, 
which satisfies the query is returned to the client computer 
820 as the dynamic address book 840. The user can then 
select one of the addresses as a "to" address for a new, reply, 
or forward mail message. 
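
As a rough illustration of the dynamic address book, the following sketch files an address entry exactly like a received message, tags it with an "address" label in the same inverted index as the words, and answers a lookup with an ordinary conjunctive query. The class name `TinyIndex`, the `label:` term prefix, and the sample data are assumptions made for this example; they are not details of the described system.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index in which words and labels share one term space."""

    def __init__(self):
        self.postings = defaultdict(set)   # term -> set of message ids
        self.messages = {}                 # message id -> text

    def add(self, msg_id, text, labels=()):
        self.messages[msg_id] = text
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(msg_id)
        for label in labels:
            # Assumed convention: a label is indexed as "label:<name>".
            self.postings["label:" + label].add(msg_id)

    def query(self, *terms):
        """Return ids of messages matching all terms (implicit "and")."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []


index = TinyIndex()
index.add(1, "From: fred Subject: lunch on friday")
# An address-book entry is filed exactly like a received message.
index.add(2, "name Fred Smith phone 555 0100 email fred@example.com",
          labels=["address"])

# The "address book" exists only as the result of this query.
address_book = index.query("label:address", "fred")
```

Because the entry lives in the same index as ordinary mail, the same query without the label term would also return message 1.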

Message Resemblance 

It is also possible to search for messages which resemble 
a currently selected message. In this case a document 
resemblance technique can be used. Such a technique is 
described in U.S. patent application Ser. No. 08/665,709, 
Method for Determining Resemblance of Documents, filed 
by Broder et al. on Jun. 16, 1996, incorporated in its entirety 
herein by reference. This allows a user to find all messages 
which closely relate to each other. 

Sorting Search Results 

When a search for an issued query completes, the results 
of the search are presented in an order according to their 
MessageID 411, FIG. 4. In practice, this means that quali-
fying messages are presented in the temporal order of when
the messages were received. 

Most prior art e-mail systems allow other sort orders, such 
as by sender, or by message thread (a sequence of related 
messages). There is no need for such capabilities here. 
Consider the following possibilities. 

Messages from a particular user can be specified by 
including in a query a term such as "from:jones." This will 
locate only messages from a particular user. Messages of a
particular "thread" can be selected by using the "view dis-
cussion" option of the user interface described below. As
stated above, messages for a particular date range can be 
specified in the query. 
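
The flexible date parsing used for date-range terms can be sketched as follows. The sketch accepts three-part dates with a numeric or named month and falls back to the European day/month order when both readings are possible. The function names, and the rule that a two-digit year means 19xx, are assumptions for the example.

```python
from datetime import date

MONTHS = {m: i + 1 for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}


def parse_date(text):
    """Parse a three-part date such as 12/25/96, 25/12/96, or Dec/25/96."""
    a, b, year = text.strip().split("/")
    # Assumption: two-digit years belong to the 1900s.
    year = 1900 + int(year) if len(year) == 2 else int(year)
    if a.lower() in MONTHS:            # month name first, e.g. Dec/25/96
        return date(year, MONTHS[a.lower()], int(b))
    if b.lower() in MONTHS:            # month name second, e.g. 25/Dec/96
        return date(year, MONTHS[b.lower()], int(a))
    first, second = int(a), int(b)
    if second > 12:                    # only month/day is possible
        return date(year, first, second)
    # Unambiguous day/month, or ambiguous: European order is assumed.
    return date(year, second, first)


def parse_range(term):
    """Split a range term such as "11/2/96-25/Dec/96" into two dates."""
    start, end = term.split("-")
    return parse_date(start), parse_date(end)
```

With these rules the ambiguous date 2/1/96 is read as 2 January 1996.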

Filtering Messages 

In order to facilitate mail handling, particularly for some- 
one receiving a large amount of e-mail, a user can configure 
the filter 280 to his or her own preferences as shown in FIG. 
9. A message filter is specified as one or more named "filter"
queries 910. The named query 910 is stored as part of the 
account information of FIG. 3. The named filter query 910 
can be composed on a client computer 920 using the client 
mail application programs down-loaded from the mail ser- 
vice system 200. 

New messages 930 received by the mail service system 
200 are stored, parsed, and indexed in the message files 400 
and full-text index 500 as described above. In addition, each 
new message 930 can be compared with the named queries 
910. If the content of a new message 930 does not match any 
of the named filter queries 910, then the new message 930 
is given the inbox label 710 and the unread label 720, i.e., 
the message is placed in the "In-box" 940 for the user's 
attention. Otherwise, the new message 930 is only given the
unread label 720. 
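
The labeling rule for new mail can be summarized in a few lines. In this sketch each named filter is modeled as a plain predicate over the message text, whereas the described system would evaluate stored queries; the function name, the filter names, and the sample domain are hypothetical.

```python
def label_new_message(text, named_filters):
    """Return the labels a newly received message would be given.

    named_filters maps a filter name to a predicate over the message text.
    """
    labels = {"unread"}
    if not any(matches(text) for matches in named_filters.values()):
        # No filter matched: bring the message to the user's attention.
        labels.add("inbox")
    return labels


filters = {
    "SentByMe": lambda text: "from: jon doe" in text.lower(),
    "Junk": lambda text: "@bulk.example.com" in text.lower(),
}

kept = label_new_message("From: Ann Lee\nSubject: review", filters)
filtered = label_new_message("From: Jon Doe\nSubject: notes", filters)
```

A filtered message is still stored and indexed; it simply lacks the inbox label and so does not appear in a query for new mail.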

For example, mail which is sent out typically has a "from" 
field including the name of the sender, e.g., "From: Jon 
Doe," in the message header. Alternatively, the body of the 
mail message may include the text, "You are getting this 
message from your good friend Jon Doe." The user Jon Doe
can set up a named filter query "SentByME" as "From near 
(Jon Doe)". This query will match any message which 
contains the word "from" near the word phrase "Jon Doe." 
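
One way to picture the "near" operator in a filter query such as "From near (Jon Doe)" is as a proximity test over word positions. The sketch below is only an illustration: the source does not say how close "near" is, so the ten-word window and all function names are assumptions.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alpha-numeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(tokens, phrase):
    """Start positions at which the phrase (a list of words) occurs."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def near(text, word, phrase, window=10):
    """True if `word` occurs within `window` words of `phrase`.

    The window size is an assumption; the source does not define "near".
    """
    tokens = tokenize(text)
    word_pos = phrase_positions(tokens, tokenize(word))
    phrase_pos = phrase_positions(tokens, tokenize(phrase))
    return any(abs(w - p) <= window for w in word_pos for p in phrase_pos)
```

Under these assumptions both the header form "From: Jon Doe" and the body sentence quoted above would match.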

The effect is that users do not explicitly become aware of
messages that match on the filter query 910. For example, a
user may want to filter messages which are "cc" copies to
oneself. A user may also desire to filter out junk e-mail
messages arriving from commercial e-mail distributors at
known domains, or pre-sort messages received via mailing
lists.

Message Display Options 

From the user's perspective, access to the mail services is
implemented by extensions to the Web browser, such as Java
applets. Messages are normally displayed by their primary
component being transmitted to the client in the HTML
format, and being displayed in the Java applet's window.
The first line of a displayed message contains any "hot-
links" which the user can click to display the message in one
of the Web browser's windows, either with the HTML
formatting, or as the original text uninterpreted by the
system.

It should be noted that headers in Internet messages, depend-
ing on routing, can be quite lengthy. Therefore, it is possible
to restrict the view to just the "from," "to," "cc," "date," and 
"subject" fields of the header. 

Embedded Links 

When displaying retrieved messages, the system 200
heuristically locates text strings which have the syntax of
e-mail addresses. If the user clicks on one of these addresses,
then the system will display a composition window,
described below, so that the user can easily generate a reply
message to the selected e-mail address(es).

Similarly, when displaying retrieved messages, the sys-
tem 200 heuristically locates text strings that have the syntax
of a URL, and makes the string a hot-link. When the user
clicks on the hot-link, the URL is passed to the browser,
which will retrieve the contents over the network, and
process the content in the normal manner.
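
A heuristic of this kind is commonly written with regular expressions. The sketch below is one possible approach, not the method of the described system: the two patterns are deliberately loose assumptions, and `add_hot_links` is a hypothetical helper that turns matching strings into HTML anchors while escaping the rest of the text.

```python
import html
import re

# Deliberately loose patterns; they are heuristics, not full grammars.
LINK_RE = re.compile(
    r'(?P<url>https?://[^\s<>"]+)'
    r'|(?P<email>[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)'
)


def add_hot_links(text):
    """Return HTML in which URLs and e-mail addresses become hot-links."""
    out, last = [], 0
    for m in LINK_RE.finditer(text):
        out.append(html.escape(text[last:m.start()]))
        target = m.group(0)
        # E-mail addresses get a "mailto:" link; URLs link to themselves.
        href = target if m.group("url") else "mailto:" + target
        out.append('<a href="%s">%s</a>' % (html.escape(href), html.escape(target)))
        last = m.end()
    out.append(html.escape(text[last:]))
    return "".join(out)
```

A loose pattern will occasionally capture trailing punctuation or miss unusual addresses, which is an accepted cost of a heuristic.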

The system also attempts to detect components in
messages, such as explicitly "attached" or implicitly
"embedded" files. The files can be in any number of possible
formats. The content of these files is displayed by the
browser 115. The specific display actions used will depend
on how the browser is configured to respond to different
component file formats.

For some file formats, for example GIF and JPEG, the
component can directly be displayed. It is also possible to
configure the browser with a "helper" applet to "display"
attached files having specific format types as "icons." For
example, the message may be in the form of an audio
message, in which case, the message needs to be "said," and
not displayed. For some message formats, the browser may
store some of the content in the file system of the client
computer.

Low-Bandwidth Filtering

Since the client computers 111-113 may access the mail
service system via low-bandwidth network connections, an
attempt is made to minimize the amount of data that are sent
from the mail service system to the client computers. Even
over high-speed communications channels, minimizing the
amount of network traffic can improve user interactions.
Because the mail service system 200 allows mail mes-
sages to include attached or embedded multi-media files,
mail messages can become quite large. In the prior art, the
entire mail message, including files, is typically shipped to
the client computer. Thus, any part of the mail message can
immediately be read by the user after the message has been
received in the client.

As shown in FIG. 10, the mail service system 200 can
recognize message components that are included as such.
The system 200 can discover an explicitly attached file 1010
to a message 1000, and the system 200 can also heuristically
discover textual components 1020-1021 that are implicitly
embedded without MIME structuring in the message. For
example, the system 200 can recognize embedded "uuen-
coded" enclosures, base 64 enclosures, Postscript (and PDF)
documents, HTML pages, and MIME fragments.

Accordingly, the system 200 is configured to "hold-back"
such components 1010, 1020-1021 encoded in different
formats using a "MIME" filter 1001. The attached and
embedded components are replaced by hot-links 1031 in a
reduced size message 1030. Only when the user clicks on
one of the hot-links 1031 is the component sent to the
requesting client computer.
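
The hold-back step can be illustrated for one of the embedded formats named above. The sketch below looks for uuencoded enclosures, which conventionally start with a "begin <mode> <name>" line and finish with an "end" line, and replaces each with a short placeholder. The link path format, the placeholder text, and the function name are hypothetical, and other formats (base 64, Postscript, HTML) would need their own detectors.

```python
import re

# A uuencoded enclosure starts with "begin <mode> <name>" and ends with "end".
UU_RE = re.compile(r"^begin [0-7]{3,4} (\S+)\n.*?^end$", re.DOTALL | re.MULTILINE)


def hold_back(msg_id, body):
    """Replace embedded uuencoded enclosures with link placeholders.

    Returns the reduced body and the held-back components, keyed by the
    (hypothetical) link path that a client would request on demand.
    """
    held = {}

    def replace(match):
        link = "/message/%s/part/%d" % (msg_id, len(held) + 1)
        held[link] = match.group(0)
        return "[attachment %s: %s]" % (match.group(1), link)

    return UU_RE.sub(replace, body), held
```

Only the reduced body is sent at first; a held-back component is transferred when its link is requested.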

Client Computer User Interface

The following sections describe how the Web browser
115 is configured to provide the e-mail services of the
system 200. The functions described can be displayed as
pull-down menus, or as button bars depending on a desired
appearance. Preferably, the functions are implemented as
Java applets.

File Menu 

The file menu has the following options: Administration,
Preferences, and Quit. If the user clicks on the Administra-
tion option button, then the system 200 loads the system
administrative page into the browser 115. Using the Admin-
istrative window, subject to access controls, the user can
view and modify accounts, and view the server log files. The
preferences option is used to modify user preferences 370.
Quit returns to the main log-in window.

Queries Menu 

This menu includes the View Discussion, Name Current
Query, Forget Named Query, Exclude "deleted" Message,
and Your Named Queries options. The View Discussion option
issues a query for messages related to the currently selected
message. Here, "related" means any messages which share
approximately the same subject line, and/or are in reply to
such a message, or messages linked by a common standard
"RFC822" message ID.

The Name Current Query allows a user to attach a text 
string to the current query. This causes the system 200 to 
place the query in the account for the user for subsequent 
use. The Forget Named Query option deletes a named query.

The Exclude "deleted" Message option omits from a
query result all messages that have the deleted label. This is
the default option. Clicking on this option changes the
behavior of the system 200 to include, in response to a query,
"deleted" messages. The Your Named Queries option dis-
plays a particular user's set of named queries 340. Clicking
on any of the displayed names issues the query.
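
The notion of "related" used by the View Discussion option can be approximated as follows. This is a sketch under assumptions: "approximately the same subject line" is taken to mean equality after stripping reply and forward prefixes, and messages are plain dictionaries whose field names (`id`, `subject`, `in_reply_to`) are invented for the example.

```python
import re


def normalize_subject(subject):
    """Strip reply/forward prefixes, case, and extra spaces from a subject."""
    s = subject.strip().lower()
    s = re.sub(r"^((re|fwd?)(\[\d+\])?:\s*)+", "", s)
    return re.sub(r"\s+", " ", s)


def related(selected, other):
    """True if `other` belongs to the same discussion as `selected`.

    Messages are dicts with "id", "subject", and optional "in_reply_to".
    """
    if normalize_subject(selected["subject"]) == normalize_subject(other["subject"]):
        return True
    # Otherwise fall back to an explicit reply link between the two messages.
    return (other.get("in_reply_to") == selected["id"]
            or selected.get("in_reply_to") == other["id"])
```

A query for the discussion would then return every stored message for which `related` holds against the currently selected message.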

Labels Menu 

This menu includes the Record Label, and Forget Label
options. These options respectively allow for the addition 
and removal of labels to and from the label state 330. 
History Menu

The client keeps a history of, for example, the last ten 
queries to allow for the reissue of queries. The options of this 
menu are Go Back, Redo Current Query, Go Forward, and 
The History List. Go Back reissues the query preceding
the current query. Redo reissues the current query. This
option is useful to process messages which have recently 
arrived, or in the case where the user's actions have altered 
the message files 400 in some other manner. Go Forward
reissues the query following the current query. The History 
List displays all of the recently issued queries. Any query 
listed can be reissued by clicking on the query. 
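
The history behavior described above maps naturally onto a bounded list with a cursor. The class below is a minimal sketch, assuming that a newly issued query is appended to the end of the list and that the navigation methods are only called once at least one query has been issued; the class and method names are hypothetical.

```python
class QueryHistory:
    """Bounded query history with back, redo, and forward navigation."""

    def __init__(self, limit=10):
        self.limit = limit
        self.queries = []
        self.pos = -1          # index of the current query

    def issue(self, query):
        """Record a newly composed query and make it current."""
        self.queries.append(query)
        self.queries = self.queries[-self.limit:]   # keep the last `limit`
        self.pos = len(self.queries) - 1
        return query

    def go_back(self):
        """Move to the query preceding the current one, if any."""
        self.pos = max(self.pos - 1, 0)
        return self.queries[self.pos]

    def redo(self):
        """Return the current query so it can be reissued."""
        return self.queries[self.pos]

    def go_forward(self):
        """Move to the query following the current one, if any."""
        self.pos = min(self.pos + 1, len(self.queries) - 1)
        return self.queries[self.pos]
```

Each method returns the query text; reissuing it against the index is what picks up newly arrived mail or changes made from other clients.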

Messages Menu 

Options here include: Select All, Select Unread, Select 
Read, Mark As Unread, Mark As Read, Add Labels, Remove 
Labels, and Use Built-in Viewer. The Select All option 
selects all messages which match the current query. The next
two options respectively select messages that do, and do
not, have the unread label. The following two options add and
remove labels to and from currently selected messages.

The user interface normally displays a message by con- 
verting the message to an HTML format and presenting it to 
an HTML viewer which can either be in the browser's main 
window, or with a built-in viewer. The last option of the 
message menu selects the viewer. 

Help Menu 

The help options can be used to display informational 
pages on how to use the various features of the system. The 
help pages are down-loaded on demand into the client 
computer from the mail service system 200. 

Main Window Menu Bar 

This menu bar includes buttons for the following func- 
tions. The functions are enabled by clicking on the button. 
Add: This button is used to add a selected label to a message.
Relabel: This button combines the functions of the unlabel
and add functions.
Delete: With this button, a deleted label is added to a
message.
Unlabel: Used to remove a single label mentioned in a query
from a message.
Next: Selects a next message.
Prev: Selects a preceding message.
Newmail: Issues a query for all messages having the inbox
label.
Query: Presents a dialog to compose and issue a query.

Message Display Button Bar

This button bar is used to perform the following functions. 
Detach: Generate a new top-level window to display
selected messages.
Compose: Generate a window for composing new mail
messages.

Forward: This function sets up a window for composing a 
new message. A selected message is attached to the new 
message. The attached messages are forwarded without 
the need of down-loading the messages to the client 
computer. 

Reply To All: This function sets up a window for composing 
a new message with the same recipients as those in a 
selected message. 
Reply To Sender: Set up a window for composing a new 
message to the sender of a selected message. 
Composition Window

Access to the composition window is gained by clicking 
on the Compose, Forward, Reply, or Modify button, or by 
clicking on a "mail-to" hot link in a displayed message. 
Compose begins a new message, forward is used to send a 
previously received message to someone else, reply is to 
respond to a message, and modify allows one to change a
message which has not yet been sent. The mail service
allows a user to compose multiple messages at a time.

The text of a message is typed in using an available 
composition window, or generating a window if none are 
available. The exact form of the typing area of the compo- 
sition window depends on the nature of the windowing 
system used on a particular client computer. Typically, while
typing the user can use short-cuts for editing actions such as 
cut, paste, copy, delete, undo, and so forth. 

Text portions from another message can be inserted by
using the Insert Msg, or Quote Msg buttons. If an entire
message is to be included, then the Forward button should
be used. The message will not actually be posted until the
send function is selected. While the message is being
composed, it is periodically saved by the mail system. Thus,
a composition session started using one client computer in
an office, can easily be completed some time later using
another computer.

Send: Sends a message. Any attachments are included
before sending the message. The user is notified of invalid
recipients by a status message, and editing of the message
can continue. Otherwise, the window is switched to
read-only mode.

Close: After a message has been sent, or the discard button
is clicked, this button replaces the send button to allow
one to close the composition window.

Discard: This button is used to discard the message being
composed, and switches the window to read-only. A user
can then click the close or modify buttons.

Modify: After a message has been successfully sent, or if the
discard button has been clicked, this button appears in
place of the discard button to allow the user to compose
another message derived from the current message.

Wrap: This function is used to limit the number of characters
on any one line to eighty, as required by some mailing
systems.

Insert Msg: Replace the selected text with displayed text
from a selected message.

Quote Msg: Replace the selected text with displayed text
from a selected message so that each line is preceded by
the character.
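
The Wrap and Quote Msg functions listed above can be sketched with the Python standard library. The eighty-character limit comes from the description of Wrap; the "> " quoting prefix is an assumption, since the quoting character is not legible in the text above, and both function names are hypothetical.

```python
import textwrap


def wrap_message(text, width=80):
    """Wrap each line so that no output line exceeds `width` characters."""
    lines = []
    for line in text.splitlines():
        # textwrap.wrap returns [] for an empty line; keep the blank line.
        lines.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(lines)


def quote_message(text, prefix="> "):
    """Prefix every line of quoted text; the "> " prefix is an assumption."""
    return "\n".join(prefix + line for line in text.splitlines())
```

Quoting before wrapping would let the prefix push lines past the limit, so a composer would likely wrap to a slightly narrower width first.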

Having described a preferred embodiment of the
invention, it will now become apparent to one skilled in the
art that other embodiments incorporating its concepts may
be used. It is felt therefore, that this invention should not
be limited to the disclosed embodiment, but rather should be
limited only by the spirit and the scope of the appended
claims.

We claim: 

1. A mail server, for use in conjunction with a plurality of 
client computers that have means for being coupled to the 
mail server via a network, the client computers each includ-
ing a browser application for viewing documents sent by
server computers, including the mail server; the mail server 
comprising: 

an index server for storing mail messages in message files,
the mail messages received on behalf of users, the
index server storing the mail messages in message
files and storing a full-text index of the mail messages,
the full-text index containing location information for
all words in the mail messages;

a mail access module for receiving and servicing a mail 
access request from one of the client computers, the 
client computer being operated on behalf of a particular 
one of the users, the mail access request including a 
query specifying one or more words associated with 
mail messages sought by the particular user, the mail 
access module including mail retrieval means for uti- 
lizing the full-text index to identify mail messages, if 
any, satisfying the query and sending a browser view- 
able document to the one client computer, the docu- 
ment containing information representing the identified 
mail messages; 

wherein 

the mail access module includes label handling means 
for adding user-defined and system predefined labels 
to the mail messages, removing labels from the mail 
messages, and storing the labels in the full-text 
index; 

the query specifies at least one user-defined label to be 
included in mail messages satisfying the query. 

2. The mail server of claim 1, including 

means for downloading client mail application programs 
to the client computers, wherein the client mail appli- 
cation programs are configured for execution in con- 
junction with the browser application on each of the 
client computers; 

wherein: 

label state is associated with the particular user, the 
label state including a set of user-defined labels 
being used by the particular user; and 

the mail server includes: 
means for storing the label state for the particular 
user; and 

means for downloading the label state of the par- 
ticular user to the one client computer, for use in 
conjunction with the downloaded client mail 
application programs. 

3. The mail server of claim 2, including 

an account manager that maintains for each user account 
information, includes a user password, the label state, 
saved queries, and user preferences. 

4. The mail server of claim 1, including 

means for downloading client mail application programs 
to the client computers, wherein the client mail appli- 
cation programs are configured for execution in con- 
junction with the browser application on each of the 
client computers; 

wherein: 

the client mail application programs include means for 
composing the query and sending the query in the 
mail access request from the one client computer to 
the mail server. 

5. A method of operating a mail server, the mail server 
operating in conjunction with a plurality of client computers 
that have means for being coupled to the mail server via a 
network, the client computers each including a browser 
application for viewing documents sent by server 
computers, including the mail server; the method compris- 
ing: 

receiving mail messages on behalf of clients, and storing
the mail messages in message files;

indexing all words in the mail messages, and storing a
full-text index of the mail messages, the full-text index
containing location information for all words in the
mail messages;

receiving and servicing a mail access request from one of
the client computers, the client computer being oper-
ated on behalf of a particular one of the users, the mail
access request including a query specifying one or
more words associated with mail messages sought by
the particular user;

the servicing step including utilizing the full-text index to
identify mail messages, if any, satisfying the query and
sending a browser viewable document to the one client
computer, the document containing information repre-
senting the identified mail messages;

adding user-defined and system predefined labels to the 
mail messages, removing labels from the mail 
messages, and storing the labels in the full-text index; 

wherein the query specifies at least one user-defined label 
to be included in mail messages satisfying the query. 

6. The method of claim 5, wherein: 

a label state is associated with the particular user, the label
state including a set of user-defined labels being used
by the particular user; and
the method further includes:
downloading client mail application programs from the
mail server to the client computers, wherein the
client mail application programs are configured for
execution in conjunction with the browser applica-
tion on each of the client computers;
storing the label state for the particular user; and
downloading the label state of the particular user to the
one client computer, for use in conjunction with the
downloaded client mail application programs.

7. The method of claim 6, including 

maintaining account information for each user, the 
account information including a user password, the 
label state, saved queries, and user preferences.

8. The method of claim 5, including 

downloading client mail application programs from the
mail server to the client computers, wherein the client
mail application programs are configured for execution
in conjunction with the browser application on each of
the client computers;

wherein:

the client mail application programs include means for
composing the query and sending the query in the
mail access request from the one client computer to
the mail server.

9. A computer program product for use in conjunction
with a computer system functioning as a mail server, the
mail server operating in conjunction with a plurality of client
computers that have means for being coupled to the mail
server via a network, the client computers each including a
browser application for viewing documents sent by server
computers, including the mail server; the computer program
product comprising a computer readable storage medium
and a computer program mechanism embedded therein, the
computer program mechanism comprising:

mail storage and indexing instructions for storing mail 
messages in message files, the mail messages received 
on behalf of users, indexing all words in the mail
messages, and storing a full-text index of the mail 
messages, the full-text index containing location infor- 
mation for all words in the mail messages; 

a mail access module for receiving and servicing a mail 
access request from one of the client computers, the 
client computer being operated on behalf of a particular 
one of the users, the mail access request including a 
query specifying one or more words associated with 
mail messages sought by the particular user, the mail 
access module including mail retrieval instructions for 
utilizing the full-text index to identify mail messages, 
if any, satisfying the query and sending a browser 
viewable document to the one client computer, the 
document containing information representing the 
identified mail messages; 

wherein 

the mail access module includes label handling instruc- 
tions for adding user-defined and system predefined 
labels to the mail messages, removing labels from 
the mail messages, and storing the labels in the 
full-text index; 

the query specifies at least one user-defined label to be 
included in mail messages satisfying the query. 

10. The computer program product of claim 9, including 
client mail application programs for downloading from
the mail server to the client computers, wherein the
client mail application programs are configured for 
execution in conjunction with the browser application 
on each of the client computers; 
wherein: 

a label state is associated with the particular user, the 
label state including a set of user-defined labels 
being used by the particular user; and 

the computer program product includes instructions 
for: 

storing the label state for the particular user; and 
downloading the label state of the particular user to 
the one client computer, for use in conjunction 
with the downloaded client mail application pro- 
grams. 

11. The computer program product of claim 10, including 
an account manager that maintains for each user account
information, includes a user password, the label state,
saved queries, and user preferences. 

12. The computer program product of claim 9, including 
client mail application programs for downloading from
the mail server to the client computers, wherein the
client mail application programs are configured for 
execution in conjunction with the browser application 
on each of the client computers; 
wherein: 

the client mail application programs include means for 
composing the query and sending the query in the 
mail access request from the one client computer to 
the mail server. 

***** 